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Abstract 



In manifold learning, algorithms based on graph Laplacians constructed from data have received 
considerable attention both in practical applications and theoretical analysis. In particular, the 
convergence of graph Laplacians obtained from sampled data to certain continuous operators has 
become an active research topic recently. Most of the existing work has been done under the 
assumption that the data is sampled from a manifold without boundary or that the functions of 
interests are evaluated at a point away from the boundary. However, the question of boundary 
behavior is of considerable practical and theoretical interest. In this paper we provide an analysis 
, of the behavior of graph Laplacians at a point near or on the boundary, discuss their convergence 

^ ^ ■ rates and their implications and provide some numerical results. It turns out that while points near 

• I the boundary occupy only a small part of the total volume of a manifold, the behavior of graph 

Q . Laplacian there has different scaling properties from its behavior elsewhere on the manifold, with 

global effects on the whole manifold, an observation with potentially important impUcations for the 
general problem of learning on manifolds. 

^ 1 Introduction 

m ; 

On , Graph Laplacian constructed from data points is a key element in many machine learning algorithms including 

CO ■ spectral clustering, e.g., (von Luxburg, 2007), semi-supervised learning (Zhu, 2006; Chapelle et al., 2006) and 

ly-j I dimensionality reduction (Belkin & Niyogi, 2003), as well as a number of other applications. A large amount 

. of work in recent years has been centered on analyzing various theoretical aspects of graph Laplacians on 

manifolds, and, in particular, on their different modes of convergence, when the data goes to infinity and/or 
the parameters, such as kernel bandwidth, tend to zero (Belkin, 2003; Lafon, 2004; Hein, 2005; Coifman 
& Lafon, 2006; Singer, 2006; Gine & Koltchinskii, 2006; Hein et al., 2007; Belkin & Niyogi, 2008; von 
Luxburg et al., 2008; Rosasco et al., 2010). A typical result in that direction shows that the discrete graph 
I Laplacian converges' to the Laplacian-Beltrami operator on manifolds when the bandwidth parameter of 

H ■ the kernel is chosen as an appropriate function of the number of data points. These results help to clarify 

our understanding of the underlying objects, to shed light on properties of the algorithms and to guide the 
selection of algorithms in practical applications. 

For example, an analysis of normalized versus unnormalized Laplacians in (von Luxburg et al., 2008)) 
suggests that normalization may be preferable in practical applications. In another example, the estimators 
of several graph Laplacian based semi-supervised learning algorithms had recently been shown to converge 
to constant solutions in the limit of infinite unlabeled points while fixing labeled points (Nadler et al., 2009), 
suggesting the use of iterated Laplacians (Zhou & Belkin, 201 1), which indeed shows superior performance 
in practice. 

The spectral convergence of a graph Laplacian is another important limit analysis of the graph Laplacian, 
which links directly to applications. The empirical spectral convergence of spectral clustering when the 
sample size n goes to infinity for a fixed kernel bandwidth t was studied by (von Luxburg et al., 2008), while 
the spectral convergence of a graph Laplacian to the Laplace-Beltrami operator when the kernel bandwidth t 
goes to zero as n goes to infinity is studied in (Belkin & Niyogi, 2007). 

However, most previous results on graph Laplacians deal with the setting where the manifold does not 
have a boundary or when the operator is analyzed at a point away from the boundary. Arguably, it is a 



'Different modes of convergence are possible here, such as different types of pointwise or uniform convergence or 
convergence of eigenvectors. 



significant short-coming of these analyses, since manifolds or domains with boundary are present explicitly 
or implicitly in many problems of significant interest in data analysis. Perhaps the simplest example is the 
fact that the pixel intensity of a gray-scale image cannot be smaller than zero, providing a natural boundary 
condition for any image manifolds. A more interesting example is in motion analysis, where the manifold of 
configurations of a human or robot body (perhaps embedded using video images or data from sensors attached 
to limbs) has boundaries corresponding to the limits for the range of motions of each individual joint. More 
generally, it is natural to think that boundaries in data are present whenever the generating process itself is in 
some way constrained. It is clear that if such manifolds are to be learned from data, the boundary behavior 
cannot be disregarded. 

In the current paper we discuss the boundary behavior of graph Laplacians by analyzing the graph Lapla- 
cian convergence at the boundary. We show that the graph Laplacian at the boundary converges to a gradient 
operator in the direction normal to the boundary, when the bandwidth parameter t is chosen adaptively as a 
function of the number of data points. We provide explicit bounds for the convergence. One of the key results 
of our analysis is that both the behavior and the scaling of the graph Laplacian near the boundary is quite dif- 
ferent from that in the interior of the manifold. Specifically, for a fixed function f{x) and a small bandwidth 
parameter t the (appropriately scaled) graph Laplacian will be close to the Laplace-Beltrami operator Af{x) 
on interior point x, while at the boundary the same object will be close to the normal derivative -^dn.f{x). 

We see that the large values of the graph Laplacian applied to a fixed function are likely to correspond to the 
boundary points. Moreover, the analysis shows that while there are few points near the boundary of a mani- 
fold, their influence on the graph Laplacian is disproportionately large and cannot be ignored. This suggests 
that the boundary has a global effect on the graph Laplacian, a finding that is confirmed by our numerical 
experiments provided in the paper Viewed in a different way it suggests that for algorithms when a graph 
Laplacian is used as a regularizer, as is the case in many applications, bounding the norm would lead to 
the suppression of the large values near the boundary. Thus the minimizer of the regularization problem (or 
similarly, the eigenvectors) should satisfy the Neumann boundary conditions, i.e., be nearly constant in the 
direction orthogonal to the boundary, which is confirmed by our numerical experiments. 

In a related line of investigation we find that the symmetric normalized graph Laplacian has a different 
boundary behavior from the random walk (asymmetric normalized) and unnormalized graph Laplacians. Un- 
like those two, for a fixed function /(x), L*/(x) converges to -^[p{x)Y/'^dn{f{x)/[p{x)]^^'^) foraboundary 

point X, where p{x) is the probability density function. This does not lead to the Neumann boundary condi- 
tion, and seems strange from a practical point of view. 

As a further illustration of the importance of boundary conditions in learning theory, we explore the 
boundary effects for a reproducing kernel in a simple 1 -dimensional example. We also discuss the limit of 
the graph Laplacian regularizer on manifolds with boundary, which cannot be taken for granted to be the same 
as the limit on or manifolds without boundary because of the boundary behavior of graph Laplacians. 

Finally we briefly compare the graph Laplacian built from random samples to the Laplacian on regular 
grids in numerical PDE's. 

1.1 Problem Setting 

We now proceed with a more technical setting of the problem^et be a compact Riemannian submanifold 
of intrinsic dimension d embedded in M^, fl the interior of il, and 50 the boundary of O, which we will 
assume to satisfy the necessary smoothness conditions^. Given n random samples X = {^i, • • • 
drawn i.i.d. from a distribution with a smooth density function on fl such that < a < p{x) < b < oo, 
we can build a weighted graph G{V, E) by mapping each sample point Xi to vertex Vi and assigning a 
weight Wij to edge Cij . One typical weight function is the Gaussian defined as Wij = Kt {Xi ,Xj) = 

l/t'^/'^e~"'^*~'^^"s"/*, which is used in this paper Let the n x n matrix W be the edge weight matrix of 
graph G with W{i,j) — Wij, and _D be a diagonal matrix such that Da — J^j "u^ij^ then the unnormalized 
graph Laplacian is defined as matrix L" 

= D - W (1) 

There are several ways of normalizing L^. For instance, the most commonly used two are the asymmetric 
random walk normalized version L"^ = D~^L^ = / — D~^W and the symmetric normalized version = 

^-l/2^«£,-l/2 ^-l/2yy£,-l/2^ 

Another useful way of building a graph Laplacian is governed by a parameter a such that we first normal- 
ize W as Wa = D^°'WD^°', then define the unnormalized, random walk and symmetric normalized graph 



^Instead of spending several pages to describe these smoothness conditions in this paper, we refer readers to (Belkin, 
2003; Lafon, 2004; Hein, 2005) for more details. 



Laplacians as 

I-D-'Wa (2) 

L^= I - Do^^/^Wc^D-^^^ 
where Da is the corresponding diagonal degree matrix for Wa ■ It is easy to see when a = 0, these graph 
Laplacians become the commonly used ones without the first step normalization. Therefore, for each value 
of a, there are three closely connected empirical graph Laplacians. 

The limit study of graph Laplacians primarily involves the limits of two parameters, sample size n and 
weight function bandwidth t. As n increases, one typically decreases t to let the graph Laplacian capture 
progressively a finer local structure. 

With a proper rate as a function of n and t, the limit of L"/(a;) for a given smooth function and fixed x 
can be shown to be Af{x) when is a compact submanifold of without boundary and p{x) is a uniform 
density. This builds a connection between the discrete graph Laplacian and the continuous Laplace-Beltrami 
operator A on manifolds, which in M'^ can be written as 

? — 1 

This connection is an important step in providing a theoretical foundation for many graph Laplacian based 
machine learning algorithms. For instance, harmonic functions used in (Zhu et al., 2003) for semi-supervised 
learning is in fact a solution of a Laplace equation, with a "point boundary condition" at labeled points. 

The limit of ij^ and its various aspects, including the finite sample analysis, are studied in (BeUcin, 2003; 
Lafon, 2004; Hein, 2005; Singer, 2006; Gine & Koltchinskii, 2006; Hein et al., 2007; Belkin & Niyogi, 2008; 
BeUcin & Niyogi, 2007). The basic result is that the limit of L'^f{x) for a; G 17 is (up to a constant ) 

^Llfix) A -AJ{x) = -^div[p'gTadf(x)] = -[A + -{Wp{x),W)]f{x) (4) 

where A^ is the weighted Laplacian and s = 2(1 — a). These papers deal with the analysis of graph 
Laplacians at an interior point of the manifold and do not deal with boundary behavior The exception to that 
is the analysis in (Coifman & Lafon, 2006), which includes manifolds with boundary, assuming the Neumann 
boundary conditions on the space of functions. Specifically, the Taylor series for the Gaussian convolution 
in (Coifman & Lafon, 2006, Lemma 9) involves a term containing the normal gradient at the boundary, which 
can be reformulated to obtain the limit for the graph Laplacian on the manifold boundary. However, there is 
no explicit discussion of the boundary behavior as well as its implication for learning in (Coifman & Lafon, 
2006). Discrete graph Laplacian is not considered in that work. We believe that given the popularity of graph 
Laplacians in machine learning, the boundary behavior of graph Laplacians deserves a more detailed study. 

In Section 2, we state some existing results on the limit analysis of the graph Laplacian as well as some 
necessary preparatory results, which will be useful for our analysis. Section 3 contains our main Theorem 2, 
which states that near the boundary, the graph Laplacian converge to the normal gradient and shows the 
scaling behavior and explicit rates of convergence. We also show how the scaling changes between the 
boundary and the interior points of the manifold. Numerical examples to support our analyses are provided 
in Section 4. Several important implications of the boundary behavior of the graph Laplacian are discussed 
in Section 5. 



yjtiX,,X,) = —e ^ (5) 



2 Technical Preliminaries 

In this section, we review the existing limit analysis of graph Laplacians LJJ, and on points away from 
the boundary of a compact submanifold. We also provide some technical results useful for our analysis in 
Section 3. 

Given an undirected graph representation of the random sample set X of size ri, the weight function with 
parameter t is defined as 

Notice that in this Gaussian weight function, the EucUdean distance should be used, instead of other distance, 
e.g., the geodesic on manifolds. It is this critical feature that on one hand makes the graph Laplacians 
computationally attractive, on the other hand has important impUcations, which will be discussed in the rest 
this paper 

Define the corresponding discrete degree function as 

1 " 

dt,n{X^) = -Y,m{X,,Xj) (6) 



Then we first normalize the weight function to obtain 

Note that this weight function also depends on the locations of Xi and Xj other than the Euclidean distance 
\\Xi — XjW^N . We use the three subscripts a,t,n to emphasize the related parameters. The corresponding 
discrete degree function is 

If the weight matrix for wt{Xi, Xj) is Wt^n and the corresponding degree matrix is Dt.n, then the normalized 
weight matrix is 

Wo^,t.n = D;-:Wt,nDi: (9) 

By finding the corresponding degree matrix Da.t,n, the unnormalized graph Laplacian is 

Ll.t^n = D^,t.n - W^,t,n (10) 

and the other two normalized versions are defined accordingly as ^ „ = / — nWa,t,n and ^ „ = 

For a fixed smooth function f{x), and any a: G O (including the samples and unseen points), define 
t nf{x) as the following, 

1 " 

i'a,t,„/W - - fix,)) (11) 

and similarly for the random walk normahzed graph Laplacian 

Ll.,.Ji.) = ^ -"'t' "i'y-' ' ^'"^» = /(.) - 1 1 ^f-^nx,) (12, 

da.t,n[X) n ^ da.t,n(X) 

ForL* J „, itcanbe shownthatL* J — D~ (^^L^ ( „F(2:) where = D~y^^f{x). Similar notions 

also apply to the degree functions. The intuition is that we treat vector {f{Xi), • • • , /(X„))^ as a sampled 
continuous function f{x) on ^l. As n oo, the vector becomes "closer and closer" to f{x). 

Three useful convergence results for the interior points will be needed in our analysis (Hein et al., 2007): 

dtAx) ^ Cip{x) (13) 

where Ci = /jj^ K{\\u\\^)du, and 

The following limit shows that the graph Laplacian on points that are away from the boundary converge to 
the density weighted Laplace-Beltrami operator with a proper rate of n and t. 

\Ll,tJ{^) ^ ^^^sf{x) (15) 

where C2 = /^d K{\\u\\'^)u\du. The limits of L"^ ^ nf{x) and j nfi^) '^an be found in (Hein et al., 2007) 
On a d-dimensional smooth manifold il, for an interior point x, thesmall neighborhood around x is locally 
equivalent to whole space R'', while for a point on the boundary of fi, i.e. x G dil, the small neighborhood 
around x is locally mapped into a half space defined as R"^ = {a; G R'', xi > 0}. This is a key fact that will 
be used in this paper. 

Next we will need a concentration inequality for the finite sample analysis of the graph Laplacian. 

Lemma 1 (McDiannid's inequality) Let Xi, • • • ,X„, Xi be i.i.d. random variables of M.^ from density 
p{x) G C°°{Ti), < a < p{x) <b<oo,\f\< M and f satisfies 

sup ,X„)-/(Xi,--- ,X„)| < c„ forl<i<n (16) 

then 



P(|/(Xi,... ,X„)-E[/(Xi,... ,X„)]| >e)<2cxp(-^^,^ (17) 



if! 



(18) 



3 Analysis of Graph Laplacian Near Manifold Boundary 

In this section, we analyze the hmits of the Laplacians ^ nfi^)' -^a t nfi^) ^^'^ t nfi^) when x is on 
or near the boundary of manifold fi. The argument roughly follows the lines of the convergence arguments 
in (Belkin, 2003; Hein, 2005; Coifman & Lafon, 2006). 

To fix the notation, in the rest of this paper, we use expressions without subscript n to indicate the corre- 
sponding limit as n — > oo, and expressions without subscript t to for the limits as i 0. 

wt{x,y)^ Kt{x,y) = -^K{x,y) 

_ 1 r^( \\-y\\lN , 1 ^irZ^ 

dt{x)= j-^wt{x,y)p{y)dy 

For smooth f{x) and p{x), 

Ll,tf{x) = [_w^,t{x, y){f{x) - f{y))p{y)dy = d^,t{x)Ll^J{x) (19) 
Jn 

and 

Ll,tf{x) = fix) ~ I ^^^j^^f{y)p{y)dy (20) 
Jn da,t(x) 

Similarly, tfi^) rewritten as tfi^) ~ '^^\^^ ix)L^ t-^i^) with F(x) = d~Y'^{x)f{x). 

Next we show the limits of the graph Laplacians on boundary point x £ dfl as i — s- and n od at a 
proper rate, when has a smooth boundary. 

Theorem 2 Let f e C^iU), \fix)\ < M, p{x) € C°°(n), < a < p{x) < b < oo, dfl be a smooth 
boundary of CI, x G 917, and t be sufficiently small, then for the unnormalized graph Laplacian L'^ ^ „ 

P{\-j.Ll,J{x) - > e) < 2cxp( ^) (21) 

for the random walk normalized graph Laplacian L"^ ^ „ 

1 f. „4-d+l 2 

P{\ Ll, J{x) - \-^d.f{x)\\ >e)< 2exp( (22) 



and for the symmetric normalized graph Laplacian 



a,t.n 



P(\^^Ll,J{x) ~ >e)< 2exp(-!^^) (23) 

where s = 2(1—0;), n is inward normal direction, Cq only depends on M, a, banda, C3 = 1/2 Jj^^ A'(||7ijp)(iM, 
and C4 ~ /jjd K{^u\^)u\du. 

Proof: We first show the limit of the expectation of L"^ J{x) as t — > in step 1 to 3. Then the limit of 
LJj (/(x) and j nf{x) can easily be found with the help of the limit of discrete degree function da,t{x)- 
At last, we obtain the finite sample results by applying Lemma (1). 

For a sufficiently small t, let ili be the set of points that are within distance 0{^/t) from the boundary dCl 
(a thin layer of "shell"), and ilo = Vl/Vti. We first show that for a small t, tf{x) is approximated by two 
different terms on i7o and ili, and more importantly these two terms have different orders of t. Then together 
with the limit of da.t{x), we can find the limit of t/(a;) and tfi^)- 

Step 1 : The key step for the limit analysis of graph Laplacians is the approximation on the manifold. 
Consider 

LlJix)= hjd§§£wifi-)-m)piy)dy ^^^^ 

= [dt{x)]-''J^Kt{x,y)[dt{y)]-''{f{x) - f{y))p{y)dy 



This integral is on the manifold fl. In order to study the the limit of this integral when t ^ 0, we can 
approximate the integral on an unknown smooth manifold by an integral on its tangent space at each point x 
when t is small such that the approximation errors of each step are comparable. For x E il, the tangent space 
is the whole space M'', while for x e dil, the tangent space is the half space M^j. [xi > 0). 

When y <E flis within an Euclidean ball of radius 0{t^^^) centered at x, in the local coordinate around a 
fixed x, the origin is point x, and let s = (si , • • • , s^) be the local geodesic coordinate of y,u ~ {ui , • • • ,Ud) 
be the projection of y on the tangent space at x. Then we have the following important approximation (see 
(Belkin, 2003, Chapter 4.2) and (Coifman & Lafon, 2006, Appendix B)). 

\\x-y\\m= Ml. + o{t^) (25) 

det(^)= l + 0(t) 

Step 2: Now we are ready to approximate each of the five terms in integral (24) when the integral is taken 
inside a ball centered at x having radius 0(t^/^) in j| • Urn norm. Notice that ||u||rc! ^ 0(t^/^). 

d-"{y) = - adi°'~\x)s'^Vdtix) + 0(8^) 

= d^^ix) ~ adt°'~^{x)u'^\/dt{x) + 0{t) 

,f{x) - f{y) = -s^VJix) - \s'^H{x)s + 0(.s3) (26) 

= -u^yj{x) - \u'^H{x)u + 0(^3/2) 

p{y) = p{x) + s^Vp(x-) + 0(s2) 

= p{x) + u^Vp{x) + 0{t) 

where H{x) is the Hessian of f{x) at point x. Notice that, the order inside of the big oh is determined by 
the larger one between the approximation error of u to s which is 0{t^^'^), and the Taylor expansion error 
on manifold as a function of s. The other observation is that, the order of the product of these terms is 
determined by the third term (f{x) — f{y)), the highest order of which is 0{t^^^), with the next ones as 0{t) 
and 0(^^/2). This means it is enough to keep the approximation terms up to order t^/^. 

Combing all the approximation together in a ball of radius 0{t^^^) around x, with a change of variable 
u t^/'^u, we can obtain t/(a;) 

LlJi-) = hji^i^im - f{y))p{y)dy 

= /nnB.(.) RgSfiF(/(^) - f(y))piy)dy + o{t^/') 

= "F7^ /onB,(.) " Vijg^){Viu^Vnx) + lu^H{x)u) 

{p{x) + Viu^S/p{x))]t^/^du + 0{t^/^) 

= /tw mu\\i,){Vt[^{u^^f{x))]+ 

t[ rf°(.) - — + llfik^ H{x)u]}du 

+ 0(t3/2) 

(27) 

where Bi {x) is a ball of radius 0{t^/'^) in |j • |juiv norm centered at x, while 82(2;) is a ball of radius 0{t^/^) 
in II • II Rd norm, and T{x) is the tangent space at point x. For a sufficiently small t, the first step replaces the 
integral over the whole 51 with ball Bi(a;), generating an error 0{t^^'^) (Coifman & Lafon, 2006, Appendix 
B). Then this integral is the same as the integral over a ball on the manifold fi. Finally, for an interior point 
X, T(a;) = W^, which means function /v (||u||^rf) is a even function of u. When taking the integral, the first 

term which has order is odd and therefore vanishes. Then the three left terms that are of order t inside the 



\ 

Figure 1 : Gaussian weight at x near the boundary. 



integral are exactly the weighted Laplacian at x, which is of order t. For a boundary point x, T{x) — R'j^. 
Next we study the interior points. 

Step 3: In Figure. I, x G ili (the "shell") is a point near the boundary, n is the inward normal direction, 
and —z is the nearest boundary point to x along n. In the local coordinate system, x is the origin, and 
along the normal direction the Gaussian convolution is from — z to +00, which is not symmetric. Therefore, 
if is not an even function in the normal direction, so the highest order term is the order 0(\/i) term. 

In this case, all the odd terms of Ui still will vanish in all directions except the normal direction n, and 
the most important point is that the leading term along the normal direction is of order \/t, while for interior 
points it is t. Next we assume ui is the normal direction. 

—LlJ{x)^~-^^p{x)dJ{x)j ■J J K{\\u\\la)uiduidu2---dud + 0{Vt) (28) 

where z is the distance to the nearest point of x on the boundary dil along the normal direction {z > 0) as 
shown in Figure 1. When t ^ 0, z ^ in the local coordinate system 

lim ^LIJ{^) = -^W)Y-'''dJ{x) (29) 

where Cy, ~ /^d if — l/2Ci, C4 — J^a if (||u|p)tii(iu. This result also needs the following 

limits, which generalize (Hein, 2005, Proposition 2.33) to points on the boundary. 

\imda.t{^) = \ (30) 

^ Cl^^"p^-^''{x),fOTXGdn 

Step 4: The normalized graph Laplacians can be obtained by normalization through da.t{x). Then the 
limit of the random walk normalized graph Laplacian is (we include the limit for interior point x for compar- 
ison) 

limt_o tfi^) = -^A,/(x), for.T e O 

(31) 

iimt^o ^,LlJ{x) = -§9„/(x), fora- e 90 

As for the limit of L^a.t.n^ it can be shown that L^a,t.nfi^) — ^a.t,nLa.t,nPi^) where F{x) ~ D^ l^f{x). 
Then the limit analysis follows easily. 
Step 5: Consider 

1 1 " 

^ i—l 

Notice that in the sum, different terms are not independent, since the degree da^t.n{x) and da,t.n{Xi) includes 
sums of all the random variables. Therefore, we need to use the McDiarmid's inequality in this step. The 
maximum change if we change a random variable is bounded by 

• ^ • 2Af (33) 

The maximum change happens when we move a point X, from a high density region with a minimum 
function value to a point Xi in a low density region with a maximum function value. Similar analyses apply 



to normalized graph Laplacians. Then We conclude the proof by applying the McDiarmid's inequaUty. ■ 

Notice that the error rate essentially comes from the McDiarmid's inequality. When a = 0, all terms in 
equation (32) are i.i.d., then we can use the Bernstein's inequality to obtain a better rate for L". For L'", 
an even better rate can be obtained as shown by (Singer, 2006). When a ^ 0, although strictly speaking 
the terms in equation (32) are not i.i.d., since da^t.n{x) is really an average of all the samples, it is almost 
a function of x alone, and da,t.n{Xi) a function of Xi alone. Then in this case, we believe it is possible to 
obtain a better error rate. 

Together with the existing analysis for interior points, we have the following implication of Theorem (2) 



^A,J(x), foix e n 



-^dnfix), foTxe dil 



(34) 



Therefore, the graph Laplacian converges to a different limit on a; e dfl from that on x G fi. More impor- 
tantly, these two limits are of different orders, one is 0{t) while the other is 0{\/t). However, in practice, 
when we apply the normalization step, we do not know where the boundary is, and always apply a global 
normalization j for all a; G 51 in order to obtain the weighted Laplacian in the limit. For a small t 



-g^^9„/(a;) + 0(l), foTxen^ 



(35) 



Notice that the error only happens on a "shell" Qi having volume 0{y/t). For f{x) such that daf{x) ^ 
on the boundary point x with enough data points, we have that for small values of t 



(36) 



4 Numerical Examples 

In this section, we explore the boundary behavior of the graph Laplacian by studying numerical examples. 



See Panel (b) on the right 



(a) Ti^,t,„/(a;) over [1,2] (b) \U^.t,J{x) over [LI, L9] (c) log(i|i^,^,„/(2)|) vs log(f) 

Figure 2: jL^ t.nfi^) with f(x) = x^ over [1, 2]. 



The values of j nfi^) with a = for 1000 points 
^ are shown in Panel (a) in Figure 



4.1 Graph Laplacian on the Boundary 
Example 1. We take H = [1, 2], and f{x) = x 

sampled from a uniform distribution (equal-spaced points) and t = 10 
(2). As expected, we see that the values at the boundary are much larger than those inside the domain and are 
consistent with —{x^)' = — (the value at 2 is roughly 4-times of the value at 1) up to a scaling factor-'. 

In Panel (b) we show the interior [1.1, 1.9] of the interval where the function is indeed the Laplacian 
~{x^)" = —6x up to a scaling factor. In Panel (c) we analyze the scaling of the graph Laplacian on the 
boundary as a function of t in the log-log coordinates. We see that log \ \L'^ t is close to a linear 

function of \og{t) with slope approximately —5 as you would expect from the scaling factor 



^The positive value at a; = 2 is the result of the normal direction pointing inward (left). 



Example 2. Next we analyze the boundary behavior for a simple low dimensional manifold. Let fl 
be half a unit sphere {z > 0), which is a 2-dimensional submanifold in R'^. The boundary is a unit circle 
{(a;, y, z) : {x^ + = 1, z = 0)}. We take /(cc, y, z) = xz, then the negative inward normal gradient on the 
boundary is 

-9„/(x) = -(z,0,a;)(0,0,lf = (37) 

where (z, 0, x) is the gradient of /(x, y, z), and (0, 0, 1) is the inward normal direction. This means the 
negative normal gradient of / along the inward normal direction on the boundary of a half sphere is a linear 
function in x with a negative slope. 

We generate a uniform sample set of 2000 points on a half sphere and compute a vector g = -ij L ^ „ / (X ) 

with a ~ Q,t ~ 0.5. We pick the set B = {(a;, y , z) € X\Q < z < 0.05} to be points near the boundary. The 
dependence between ■^L'^a t nfi^i 2/i ^) ™d ^ for 2000 data points sampled from the uniform distribution 
on the half sphere is plotted in Figure (3) and is consistent with our expectation. 



Figure 3: -^L^ t nfi^) '^he boundary of a uniform half sphere, with t = 0.5. 

To provide a more rigorous error analysis we compute the mean square errors for several values of t and 
sample size n. The results are shown in Table (1). We see that the errors are relatively small compared to the 
values of the gradient and generally decrease with more data. 



Table 1: Mean square errors between the analytical normal gradient —dnf{x,y,z) = —x and 
-i=LJ^ J „f{x, y, z) on the boundary of a half sphere with f{x, y, z) — xz. 



n\t 


64/64 


32/64 


16/64 


8/64 


4/64 


2/64 


1/64 


500 


0.0090 


0.0059 


0.0071 


0.0136 


0.0287 


0.0500 


0.0725 


1000 


0.0089 


0.0048 


0.0033 


0.0061 


0.0121 


0.0294 


0.0627 


2000 


0.0113 


0.0076 


0.0073 


0.0068 


0.0159 


0.0356 


0.0585 


4000 


0.0083 


0.0044 


0.0036 


0.0044 


0.0072 


0.0189 


0.0516 



4.2 Comparison to Numerical PDE's 

From previous analysis we can see that, for graph Laplacians, the "missing" edges going out of the manifold 
boundary on one hand can be seen as being reflected back into fi, which is particularly intuitive in symmetric 
/cNN graphs, see e.g., (Maier et al., 2009), on the other hand, it can be seen that function values on edges 
going out of Q. are constant along the normal direction. The latter view is commonly used in schemes of 
numerical PDE's in finite difference methods for the Neumann boundary condition, see e.g., (Allaire, 2007). 

We use an example in to show how the Neumann boundary condition for a Laplace operator on a 
regular grid is implemented in finite difference method, which we hope can shed light on the graph Laplacian 
on random points. The Laplace operator in is A/ = ~d^f ~ dyf, and the regular grid near the boundary 
is shown in Figure (4). Since we can separate the Laplacian into partial derivatives of different dimensions, 
the discrete Laplace matrix L on the regular grid with the Neumann boundary condition near xq along x 
direction can be shown to be 

1 -1 ••• " 

-1 2-1 ••• 
0-1 2-10 ••• 



Xo 

L = xi 

X2 



8D. 

Y 



yi 



Figure 4: Regular grid in 



where we only connects points that are next to each other Along y direction the Laplace matrix elements 
near xq are [• • • — 1 2 — 1 • • • ]. Let the distance between data points be h. Consider point .tq on the 
boundary, along y direction, we have a 3-point stencil along y axis, j/o = XQ,yi, which is enough to 
define d'^f{xo)- 



lim ^Lfiyo) 



lim 



fiy-i)-2f{yo) + f{y,) 



This is also true for points that are in the interior fl, e.g. xi, X2, etc. However, for xq on the boundary along 
X direction (normal direction at xq) we only have two points 



1 

ro 7^^ 



lim —Lf{xo) = — lim 



f{xi) - f{xo) 



w? , - lim 

h'^ /i-i-o h 

This shows that Lf{x) "converge" to j^daf{x) for x on the boundary while to Af{x) for x inside of the 
domain, with a different scaling behavior. This is almost the same as what happens to the graph Laplacian 
on random samples. Notice in numerical PDE's we also have that if dxf{xo) 7^ 0, as /i 0, 



dx.f{xa) 



h 



00. 

In fact, if we construct the graph Laplacian matrix D — Vt^ by setting Wij = 1 if two points are next to each 
other and wij ~ otherwise, on two dimensional grid as shown in Figure (4), the graph Laplacian matrix is 
the same as the Laplace matrix with the Neumann boundary condition in numerical PDE's. 

In order to let Lf{x) converges to A/(a;) for all x on domain fl with a single normaUzation term, we can 
add a 'fictitious' point x^i along the normal direction, and let = f{xo). Then as h ^ we have 



1 



Lfixo) 



fix,) - fjxo) _ - 2f{xo) + fix,) 



-dlfix^) 



Together with y direction, we have Vx G ft, limii^o Lfix) — —d^fix) — dyfix) — Afix). Condition 
= fixo) then becomes 

fix-i) - fixo) 



lim 



dec fixo) = 



which is the Neumann boundary condition. This method is used to implement the Neumann boundary con- 
dition in finite difference methods for PDE's, see e.g., (Allaire, 2007, Chapter 2). 

The graph Laplacian can be seen as an implementation of the Neumann Laplacian on random points, 
which generalizes regular grids to random graphs based on random samples. This also means by construction, 
the graph Laplacian is a Neumann Laplacian, which is a built-in feature of the graph Laplacian. 

5 Discussions and Implications 
5.1 Neumann Boundary Condition 

Our analysis of the boundary behavior suggests that the eigenfunctions of both and L" as well as solutions 
of certain regularization problems should satisfy the Neumann boundary condition. However, this is not true 
for the symmetric normalized Laplacian L^. 

Unnormalized and Random Walk Normalized Graph Laplacians: These two versions of graph Lapla- 
cians only differ in the density weight outside of the normal gradient, so we only need to find the boundary 
condition for one of them. Let (fn (a;) and Xi be the i*'' right eigenfunctions of L^, then for any positive integer 
i and x e dil, the following Neumann boundary condition holds. 



dn(t>iix) = 



(38) 



This is also true for the eigenfunctions of L", the limit of the unnormaUzed graph Laplacian, which can be 
seen as follows. All the eigenfunctions should satisfy 

t^o t 

On the boundary 

lim -rLa,tMx) ^ - Jim -j=dn(j}i{x) 

t-i-O t ' t-i-O y/t 

Since \i(j)i{x) < oo for all x G fl, if (j)i{x) does not satisfy the Neumann boundary condition, then 
limf^o t'Pii^) — >■ oo on the boundary, therefore such 4>i{x) can not be the eigenfunctions of L"^. This 
implies that all the eigenfunctions of should satisfy the Neumann boundary condition, i.e., Vi, da(f>i{x) = 
for X e d^l. Similarly, this is true for the unnormalized graph Laplacian with a bounded density. 




We numerically compute the second and third eigenfunctions of ^ „ with a = 1/2 in Figure (5)^. The 
left panel shows the eigenfunctions over interval [0, 1] with a uniform density, while the right panel is for a 
mixture of two Gaussians. As the numerical results suggest, the second and third eigenfunctions of the graph 
Laplacian satisfy the Neumann boundary condition, and this is also true for other eigenfunctions. In fact for 
a uniform over [0, 1], the Neumann eigenfunctions for the Laplacian are cos(fc7ra;) where k = 0, 1, 2, • • • . 
The second and third eigenfunctions correspond to cos(7r.T) and cos(27ra;), up to a change of sign, which is 
consistent with the numerical results in the left panel of Figure (5). 

Symmetric Normalized Grapli Laplacian: For the symmetric normalized graph Laplacian ^ „, the 
Neumann boundary condition does not hold for its eigenfunctions in the limit. This can be shown by a one to 
one correspondence between the eigenfunctions of j „ and ^ „. Let the eigenvectors for ^ „ be ipi, 
and the right eigenvectors for LJ^ ^ „ be (pi, then 

This is true for any sample size and any parameter t. Therefore, in the limit 

dnM^) = dn[d^/^{x)Mx)] = 0,(x)a„di/2(x) 



''The Neumann boundary condition also holds for other a. 



Near the boundary along the normal direction, degree function d{x) decreases as a result of the asymmetric 
interval for J-^Kt{x, y)p{y)dy, so tends to be "bent" towards zero near the boundary. Since how the 

graph is constructed will determine what the degree function will be, the boundary behavior also depends on 
what graph is used. We test two graphs, eNN graph and symmetric fcNN graph, which are studied by (Maier 
et al., 2009) for clustering. In the left panel of Figure (6), d{x) is scaled and shifted to fit the plot and an eNN 
graph is used. For x near the boundary, the degree function decreases as a result of having less points in the 
fixed radius neighborhood. Therefore, ip2{x) is "bent" towards zero. 

Notice that for the symmetric A;NN graph, in the right panel of Figure (6), the eigenfunctions can have 
"bumps" near the boundary. This is the result that for symmetric /cNN graphs, the edges going out of the 
boundaries are "reflected" back. For example consider /c = 10 in a symmetric fcNN graph, with the distance 
between points set as 0.01, for point a; = 0, its nearest neighbors are x = 0.01, 0.02, • • • , 0.1. However, 
for point x = 0.1, x = is not in its k nearest neighbors. This means the constructed graph will not be 
symmetric. If we add eji for every asymmetric edge e^ , then although the graph is symmetric now, d{x) for 
point X = 0.1 will be much larger than other points. As t decreases, this "bumps" will shift to the boundary. 




(a) On an eNN graph. (b) On a symmetric fcNN graph. 

Figure 6: tp2{x), 4>2{x) and d{x) for uniform over [0, 1]. 



5.2 Limit of Graph Laplacian Regularizer 

The following graph Laplacian regularizer is a popular penalty term in many semi-supervised learning algo- 
rithms when a = 0. 

This limit is studied in (Hein, 2005, Chapter 2) without considering the boundary, and is also studied on 
by (Bosquet et al., 2004), which has no boundary and is not a low dimensional manifold either From 
Theorem (2), we see that the graph Laplacian has a limit of different scaling behavior on the boundary points 
compared to the interior points. This leads to the question of the limit of the graph Laplacian regularizer 
t nf when it is defined on a compact submanifold with a smooth boundary. Based on the quadratic 
form, we use a similar method as the proof of Theorem (2) to obtain the next theorem. 

Theorems For a fixed function f{x) e C^{il), and let p{x) G C°°{n), < a < p{x) < b < oo, and the 
intrinsic dimension offl be d, then as n ^ oo, t — > and n "> OO, 

Yiui \fLl^J [ \\Vf{x)\\^[p{x)]^-^'-dx, in probability (40) 

where C = l^T:d{i/2-a) _ 

Proof: Following the proof of Theorem (2), let fix be a thin lay of "shell" of width O(v^), and VLq ~ Vl/Vli. 
For a fixed t, consider the quadratic form (39), the limit as n oo is 



(42) 



Then by the approximation on manifolds (25) and (26), for a fixed x G il, 

^p{x)dr{x) !^K{'^)dr{y)U{x) - f{y)?p{y)dy 

= V/(x-)f /^(^, K{tf)uldu + 0(t3/2) 

= V/(x)|p /^(^) KiWurWidu + 0{t''^) 

Notice that in this case, the highest order is controlled by {f{x) — f{y)Y, which is of order 0(t), and T(x) 
is the tangent space at a;. Then for any point x e O, the limit is 

^^||V/(.)inp(.)]— (43) 

where 



(44) 



Ci {x) ^ • • • /^^ /+f e-lt-lt 

with z as the distance between x and the nearest boundary point along the normal direction. 
On flo, for a small t, we can replace z with oo, then the integral on Hq is 



2Cf 



Vf{x)\\'[pix)Y-"'dx (45) 



where the coefficient becomes a constant independent of x. 

On the shell fii, as i — > 0, the shell shrinks into a set with measure zero. As long as the function inside 
the integral is bounded, the integral on fii will also be zero. For x G fli, we have < ^ < +oo and 

i7r''/2 < C2ix) = i7r'^/2(l - + erf(z)) < tt'^/^ 

(46) 

i7r''/2 < Ci(x) = ^n'^/^il+CTfiz)) < tt'^/^ 

For / G C^(ri) and any x e Q,m any direction, | ^^^^^ | < oo. The density p{x) is also bounded, therefore, 

the integral on Jli is zero as f ^ 0. Overall, as i — > 0, C2{x) C2, Ci{x) -> Ci, and JIq becomes fl, 
therefore. 

Next consider 

Since all the n? terms in the sum are not i.i.d., we use the McDiarmid's inequality. The maximum change of 
replacing one random variable is 

therefore, 

P{\\fLl,tJ-ci\\yf{x)f[p{x)]''-^''dx\ >e)< 2e-T^ (50) 

We conclude the proof by plugging in Ci — tx'^I'^ and C2 — \'k'^I'^ . ■ 

One important implication of this theorem is that, in order to use a gradient penalty term w.rt. to different 
density weights in the form of (.t) in the limit, we can use f'^ t nf with different a values. For instance, 
to obtain \\W f {x)\\^p{x)dx instead of having p'^{x) as the commonly used penalty term f'^L^f, we can 
set a = 1 /2. This penalty then fits the fact that sample Xi are drawn from density p{x). 

We tested Theorem (3) numerically on several functions with a uniform density over 1-dimensional in- 
terval [0, 1]. 1001 equal-space points are generated, the value of t is between 1 and 10"'', and we compute 



Table 2: Coefficient test for different functions. The largest value for different t of each function is reported. 



a\f{x) 


/ 1 -1 

Vx + l 


X 


+ lOx 


x-^ 




sin(a;j 


cos(x) 


cos(10a;) 


1 fl /2 — n) 
4 " 





0.4424 


0.4424 


0.4424 


0.4420 


0.4423 


0.4424 


0.4423 


0.4426 


0.4431 


1/2 


0.2497 


0.2497 


0.2497 


0.2497 


0.2497 


0.2497 


0.2497 


0.2497 


0.2500 


1 


0.1411 


0.1411 


0.1411 


0.1412 


0.1411 


0.1412 


0.1410 


0.1411 


0.1410 


-1 


1.3845 


1.3846 


1.3846 


1.3819 


1.3840 


1.3827 


1.3865 


1.3859 


1.3921 



the numerical graph Laplacian approximation -p^f'^L'^ t „/, and the analytical value of j]^ \f'{x)\'^dx. The 
ratios of the two are reported in Table (2) and the coefficient plots as a function of log(<) are shown in 
Figure (7). 

In Figure (7), eight different functions from Table (2) are tested for the coefficient C using different 
bandwidth t. Each plot corresponds to one fixed a value. As suggested by the figure, in Table (2), the 
maximum values from different t are reported. From both the figure and the table we can see that, the 
numerical results are close to the theoretical coefficient ivr^/^^". The figure also suggests a numerically 
stable patterns as t decreases, until it is too small and loses numerical precision. 





Figure 7: Coefficient C as a function of t for different functions. The solid line is the theoretical result, and 
the X axis corresponds to — log(t). 



Notice that on finite samples, it is also possible to treat f'^L'^ j „/ as an inner product as (/, ^ „/). 
However, in the limit, as we see in this paper, L^!^ j „/ can degenerate to an unbounded value on the boundary. 

Although this behavior only happens on a small part having volume 0{t^^^) and the degenerating behavior 
scales as t^/^, they cannot cancel out each other since we can not bring the limit t ^ across the integral 
when the sequence of functions inside of the integral have an unbounded limit, i.e., it violates the condition of 
Dominated Convergence Theorem. When / satisfy the Neumann boundary condition or fl has no boundary. 



then it is safe to compute the hmit as an inner product, which is essentially the Green's first identity. 



5.3 Reproducing Kernels and Boundary Effects 

In the short discussion below we would like to illustrate the importance of boundary effects in a simple 1- 
dimensional setting. Consider the regularizer /-^L"/, whose limit for a fixed / in has the following 
expression (Bosquet et al., 2004). 

Jif) = I ||V/(x)||V(a:)dx 

In M}, the subspace orthogonal to the null space of J(/) is a reproducing kernel Hilbert space (RKHS) 
(Nadler et al., 2009). 

Consider its reproducing kernel function K{x^ y). Over the unit interval [0, 1] with uniform probability 
density the kernel function can be found explicitly by eigenfunction expansion of the Green's function of 
the weighted Laplacian (Green's function in this case is the same as kernel K{x, y)). Using the Neumann 
boundary condition, we find that the reproducing kernel (in the subspace orthogonal to the null space) has the 
following expression: 

oo ^ 

K{x, y) — — — — cosik-Kx) cos( kny) (51) 
t[ ^^^^ 

On the other hand, in (Nadler et al., 2009) the kernel without boundary conditions is shown to be 

K'{x,y) = -^-]^\x-y\ (52) 

which is a different function. 

In order to test our analysis, we notice that in finite sample case the discrete Green's function of LJ^ ^ „ is 
the same as the reproducing kernel functions for space {/ : f"^ t „/ < oo}, which is the pseudoinverse 
of the matrix ^ „ (Berlinet & Thomas-Agnan, 2003, Chapter 6). In Figure (8), we compute and plot the 
kernels numerically to verify the above analysis. On one hand, we use the eigenfunction expansion (51) 
to find the kernel. On the other hand, we build the graph Laplacian matrix and compute its pseudoinverse 
to obtain the approximate kernel function. As shown in Figure (8), we can see that the kernel obtained by 
eigenfunction expansion analytically is very close to the one obtained from the pseudoinverse of the matrix 

f „ (up to a constant scaling factor). This kernel is quite different from K'. The difference is due to 
the global effect of the boundary behavior of the graph Laplacian and provides additional evidence for the 
Neumann boundary condition in eigenf unctions. 



(a) Kernel K by eigenfunction ex- (b) Kernel K by pseudoinverse of the (c) Kernel K' 

pansion graph Laplacian 

Figure 8: Kernels at 0.25 over [0, 1] in subspace that is orthogonal to the null of ^ 
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